Corpus
Video Lectures
For Developers
You can also see the Java, Python, Cython, Swift, C++, or C# repository.
Requirements
Node.js
To check if you have a compatible version of Node.js installed, use the following command:
node -v
You can find the latest version of Node.js here.
Git
Install the latest version of Git.
Npm Install
npm install nlptoolkit-corpus
Download Code
In order to work on the code, create a fork from the GitHub page.
Use Git to clone the code to your local machine, or use the line below on Ubuntu:
git clone <your-fork-git-link>
A directory called corpus-js will be created. Alternatively, you can use the link below to explore the code:
git clone https://github.com/starlangsoftware/corpus-js.git
Open project with WebStorm IDE
Steps for opening the cloned project:
- Start IDE
- Select File | Open from main menu
- Choose the Corpus-Js file
- Select the open as project option
- After a couple of seconds, dependencies will be downloaded.
Detailed Description
Corpus
To store a corpus in memory
let a = new Corpus("derlem.txt");
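Conceptually, storing a corpus in memory means reading the file once and keeping its sentences in an array. The following is a minimal sketch, assuming one sentence per line with whitespace-separated words. `parseCorpus` and `loadCorpus` are hypothetical helpers, not part of nlptoolkit-corpus, and they return plain word arrays instead of the library's Sentence objects.

```typescript
import { readFileSync } from "fs";

// Hypothetical helper, not part of nlptoolkit-corpus. Assumes one sentence
// per line, with words separated by whitespace; blank lines are skipped.
function parseCorpus(text: string): string[][] {
    const sentences: string[][] = [];
    for (const line of text.split(/\r?\n/)) {
        const words = line.trim().split(/\s+/).filter(w => w.length > 0);
        if (words.length > 0) {
            sentences.push(words);
        }
    }
    return sentences;
}

// Hypothetical helper: read the whole file as UTF-8 (needed for Turkish
// letters such as ç, ğ, ş) and keep every sentence in memory.
function loadCorpus(fileName: string): string[][] {
    return parseCorpus(readFileSync(fileName, "utf8"));
}
```

Separating parsing from file reading keeps the line-splitting logic testable without touching the file system.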
If the corpus text is separated by periods but not already split into sentences, pass a SentenceSplitter:
constructor(fileName: string = undefined, splitterOrChecker: SentenceSplitter = undefined)
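The role of the splitter argument can be sketched as follows. This is a minimal illustration, not the library's code: `SimpleSplitter`, `NaivePeriodSplitter`, and `buildCorpus` are hypothetical names, sentences are plain word arrays instead of Sentence objects, and the naive splitter simply cuts at '.', '!' and '?' and discards them.

```typescript
// Hypothetical stand-in for SentenceSplitter: split() turns one line of raw
// text into zero or more sentences, each a list of words.
interface SimpleSplitter {
    split(line: string): string[][];
}

// Naive splitter: treats runs of '.', '!' and '?' as sentence boundaries
// and drops the punctuation itself.
class NaivePeriodSplitter implements SimpleSplitter {
    split(line: string): string[][] {
        return line
            .split(/[.!?]+/)
            .map(part => part.trim().split(/\s+/).filter(w => w.length > 0))
            .filter(words => words.length > 0);
    }
}

// Build an in-memory corpus: with a splitter, each line may yield several
// sentences; without one, each non-empty line is one sentence.
function buildCorpus(lines: string[], splitter?: SimpleSplitter): string[][] {
    const sentences: string[][] = [];
    for (const line of lines) {
        if (splitter !== undefined) {
            sentences.push(...splitter.split(line));
        } else {
            const words = line.trim().split(/\s+/).filter(w => w.length > 0);
            if (words.length > 0) {
                sentences.push(words);
            }
        }
    }
    return sentences;
}
```

Passing the splitter as an object keeps the corpus code independent of the splitting rules, so a language-specific splitter can be swapped in without changing how the corpus is built.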
To eliminate non-Turkish sentences from the corpus, pass a LanguageChecker:
constructor(fileName: string = undefined, splitterOrChecker: LanguageChecker = undefined)
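A language checker acts as a filter over the sentences. The sketch below is a toy, not the library's LanguageChecker: `SimpleLanguageChecker`, `LexiconChecker`, `filterSentences`, the small-lexicon heuristic, and the 0.5 threshold are all hypothetical choices made for illustration.

```typescript
// Hypothetical stand-in for LanguageChecker: decides whether one sentence
// should be kept in the corpus.
interface SimpleLanguageChecker {
    isTurkish(words: string[]): boolean;
}

// Toy checker: a sentence passes if at least `threshold` of its words
// appear in a small lexicon of known Turkish words.
class LexiconChecker implements SimpleLanguageChecker {
    private readonly lexicon: Set<string>;
    private readonly threshold: number;

    constructor(lexicon: Set<string>, threshold: number = 0.5) {
        this.lexicon = lexicon;
        this.threshold = threshold;
    }

    isTurkish(words: string[]): boolean {
        if (words.length === 0) {
            return false;
        }
        // Lowercase using the Turkish locale (dotted/dotless i rules).
        const known = words.filter(w => this.lexicon.has(w.toLocaleLowerCase("tr-TR"))).length;
        return known / words.length >= this.threshold;
    }
}

// Keep only the sentences the checker accepts.
function filterSentences(sentences: string[][], checker: SimpleLanguageChecker): string[][] {
    return sentences.filter(s => checker.isTurkish(s));
}
```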
The number of sentences in the corpus
sentenceCount(): number
To get the ith sentence in the corpus
getSentence(index: number): Sentence
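The two accessors are typically used together to walk the corpus by index. A minimal sketch, assuming zero-based indices as in a JavaScript array; `InMemoryCorpus` and `countWords` are hypothetical, sentences are plain word arrays, and the bounds check that throws a RangeError is this sketch's own choice, not documented library behavior.

```typescript
// Hypothetical minimal class mirroring the two accessors documented above;
// sentences are plain word arrays instead of the library's Sentence objects.
class InMemoryCorpus {
    private readonly sentences: string[][];

    constructor(sentences: string[][]) {
        this.sentences = sentences;
    }

    // Number of sentences held in memory.
    sentenceCount(): number {
        return this.sentences.length;
    }

    // Zero-based access to the ith sentence; this sketch throws on a bad index.
    getSentence(index: number): string[] {
        if (!Number.isInteger(index) || index < 0 || index >= this.sentences.length) {
            throw new RangeError(`sentence index ${index} out of range`);
        }
        return this.sentences[index];
    }
}

// Typical loop: visit every sentence by index and total the word counts.
function countWords(corpus: InMemoryCorpus): number {
    let total = 0;
    for (let i = 0; i < corpus.sentenceCount(); i++) {
        total += corpus.getSentence(i).length;
    }
    return total;
}
```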
TurkishSplitter
The TurkishSplitter class is used to split text into sentences according to the punctuation rules of Turkish.
split(line: string): Array<Sentence>
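One reason a Turkish-specific splitter is needed: Turkish writes ordinal numbers as digits followed by a period ("3. sınıf", third grade), so a period alone is not a reliable sentence boundary. The sketch below illustrates that single rule under stated assumptions; `splitTurkish` is a hypothetical function, not TurkishSplitter's actual algorithm, it returns word arrays rather than Sentence objects, and tokens keep their attached punctuation.

```typescript
// Hypothetical sketch, not TurkishSplitter's actual algorithm: split on
// sentence-final punctuation, but treat "<digits>." followed by a lowercase
// word as a Turkish ordinal (e.g. "3. sınıf") rather than a sentence end.
function splitTurkish(line: string): string[][] {
    const tokens = line.trim().split(/\s+/).filter(t => t.length > 0);
    const sentences: string[][] = [];
    let current: string[] = [];
    for (let i = 0; i < tokens.length; i++) {
        const token = tokens[i];
        current.push(token);
        const next = i + 1 < tokens.length ? tokens[i + 1] : "";
        const isOrdinal = /^\d+\.$/.test(token) && /^[a-zçğıöşü]/.test(next);
        if (/[.!?]$/.test(token) && !isOrdinal) {
            sentences.push(current);
            current = [];
        }
    }
    if (current.length > 0) {
        sentences.push(current); // trailing text without final punctuation
    }
    return sentences;
}
```

A real splitter also has to handle abbreviations, decimals, and quotations; this sketch covers only the ordinal case.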